Latest Update: 2021-12-09 20:59:55

Introduction

Heart disease is the one of the leading cause of death for men, women, and people of different races in the United States. The term “heart disease” includes several types of heart conditions. In the United States, the most common type of heart disease is coronary artery disease (CAD), which can lead to heart attack. In fact, one person dies every 36 seconds in the United States from cardiovascular disease.About 659,000 people in the United States die from heart disease each year, which is in every 4 deaths. Heart disease costs the United States a lot each year, which includes the cost of health care services, medicines, and lost productivity due to death. It would be helpful to examine the pattern of heart disease mortality rate under gender and race stratification to understand the behind association to some extent.

Data Background

The data was obtained from CDC chronic disease and health promotion data & indicators: https://chronicdata.cdc.gov/Heart-Disease-Stroke-Prevention/Heart-Disease-Mortality-Data-Among-US-Adults-35-by/i2vk-mgdh, and it was organized by National Vital Statistics System and focused on heart disease mortality data in US during 2014. The data was collected in county level. Here were the basic information of this dataset:

  • 2013 to 2015, 3-year average. Rates are age-standardized. County rates are spatially smoothed. The data can be viewed by gender and race/ethnicity. Data source: National Vital Statistics System. Additional data, maps, and methodology can be viewed on the Interactive Atlas of Heart Disease and Stroke http://www.cdc.gov/dhdsp/maps/atlas

Since it might be too vague to observe the mortality rate from the entire country level, a focus on California data is included in this investigation. The FIPS code of each county was obtained from https://raw.githubusercontent.com/kjhealy/fips-codes/master/state_and_county_fips_master.csv.
The main question is: How were gender and races associated with heart disease death rate in California during 2014?
To better analyze the question step-by-step, several sub-questions are added

  • What was the distribution of heart disease death rate under gender and race stratification?
  • Which county had relatively higher mortality rate gap between gender groups?
  • How was the mortality rate distributed in each county under gender and race classification?

Methods

The data set included the following important variables which will be used later:

  • LocationDesc: county name
  • Data_Value: heart disease death rate per 100,000 population
  • Stratification1: gender categories (Male, Female)
  • Stratification2: race categories (White, Black Hispanic, Asian and Pacific Islander, American Indian and Alaskan Native)
  • FIPS code: used for locating county

Read In Data

The original data source was included in the Method part.

heartdisease <- "/Users/hms/Desktop/PM566-project/Final/Data/Heart_Disease_Mortality_Data_Among_US_Adults__35___by_State_Territory_and_County.csv"
heartdisease <- data.table::fread(heartdisease)

Data Wrangling

The data set was cleaned according to EDA checklists. There were several “NAs” under “Data_value” variable, which was due to insufficient information. It was replaced by median value of mortality rate, which was used for later data visualization. Since the data set included data from entire country, only data in California was selected for analysis. After data wrangling, datasets CA_race and CA_gender were created and each of them was merged with FIPS code, whose source was introduced in the Introduction session.

The gender data included variables: county name as LocationDesc, male mortality rate as value_male, female mortality rate as value_female and FIPS code for each county as fips.

The race data included variables: county name as LocationDesc, the White mortality rate as value_white, the Hispanic mortality rate as value_hispanic, the Black mortality rate as value_black, the Asian and Pacific Islander mortality rate as value_asian_pacific, the American Indian and Alaskan Native mortality rate as value_indian_alaskan and FIPS code for each county as fips.

# select data in California
heartdisease_CA <- heartdisease[LocationAbbr == 'CA' & GeographicLevel == 'County']
# convert data into num format
heartdisease_CA$Data_Value <- as.numeric(heartdisease_CA$Data_Value)
# select data under each stratification
CA_gender <- heartdisease_CA[Stratification1 != 'Overall' & Stratification2 == 'Overall']
CA_gender$Data_Value[is.na(CA_gender$Data_Value)] <- median(CA_gender$Data_Value, na.rm = TRUE)                                          
CA_race <- heartdisease_CA[Stratification2 != 'Overall' & Stratification1 == 'Overall']
CA_race$Data_Value[is.na(CA_race$Data_Value)] <- median(CA_race$Data_Value, na.rm = TRUE) 

# gender data handling
CA_male <- CA_gender[Stratification1 == 'Male']%>% 
  select(LocationDesc, Data_Value)%>%
  rename(value_male = Data_Value)
CA_female <- CA_gender[Stratification1 == 'Female'] %>% 
  select(LocationDesc, Data_Value)%>%
  rename(value_female = Data_Value)
gender_joint <- merge(CA_male, CA_female, by.x = "LocationDesc", 
             by.y = "LocationDesc", all.x = TRUE, all.y = FALSE)
gender_joint$Gap <- (gender_joint$value_male - gender_joint$value_female)

df <- read.csv('https://raw.githubusercontent.com/kjhealy/fips-codes/master/state_and_county_fips_master.csv')
fips <- filter(df,state == "CA")

CA_gender1 <- merge(gender_joint, fips, by.x = "LocationDesc", 
             by.y = "name", all.x = TRUE, all.y = FALSE)
CA_gender1 <- CA_gender1 %>% 
  mutate(fips = ifelse(row_number()>= 1,paste0("0", fips)))
url <- 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
counties <- rjson::fromJSON(file=url)

# Race data handling
CA_white <- CA_race[Stratification2 == 'White']%>% 
  select(LocationDesc, Data_Value)%>% 
  rename(value_white = Data_Value)
CA_hispanic <- CA_race[Stratification2 == 'Hispanic'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_hispanic = Data_Value)
CA_black <- CA_race[Stratification2 == 'Black'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_black = Data_Value, )
CA_asian_pacific <- CA_race[Stratification2 == 'Asian and Pacific Islander'] %>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_asian_pacific = Data_Value)
CA_indian_alaskan <- CA_race[Stratification2 == 'American Indian and Alaskan Native']%>% 
  select(LocationDesc, Data_Value) %>%
  rename(value_indian_alaskan = Data_Value)
data_list <- list(CA_white, CA_hispanic, CA_black, CA_asian_pacific, CA_indian_alaskan) 
CA_race1 <- data_list %>% reduce(inner_join, by = "LocationDesc")

CA_race1 <- merge(CA_race1, fips, by.x = "LocationDesc", 
             by.y = "name", all.x = TRUE, all.y = FALSE)
CA_race1 <- CA_race1 %>% 
  mutate(fips = ifelse(row_number()>= 1,paste0("0", fips)))

Preliminary Results

The visualization of data was included in this part which replied to the sub-questions above. The results corresponded to the general patterns of stratification, differences within stratification and geographical distributions.

Mortality rate distribution within sex and race stratification in CA

  • Use dataset CA_gender and CA_race to create distribution graphs for two stratifications
  • Apply ggplotly to obtain the interactive visualization
# create distribution graph graph to find association between gender and death rate
p1 <- ggplot(CA_gender, aes(Data_Value, fill = Stratification1))+ 
     geom_density(alpha = 0.5) +
     scale_fill_brewer(palette = "Set3") +
     labs(
      x = "death rate per 100,000 population",
      y = "Density",
      title = "Distribution of death rate by gender in CA")
p1 <- ggplotly(p1)

# create distribution graph to find association between race and death rate
p2 <- ggplot(CA_race, aes(Data_Value, fill = Stratification2))+ 
     geom_density(alpha = 0.5) +
     scale_fill_brewer(palette = "Set3") +
     labs(
      x = "death rate per 100,000 population",
      y = "Density",
      title = "Distribution of death rate by race in CA")
p2 <- ggplotly(p2)

Mortality rate under sex stratification in CA during 2014

Mortality rate under race stratification in CA during 2014

For graph of death rates under gender stratification, the distribution of male group was on the right compared to female group in general, which indicated that the male had relatively higher heart disease death rate than the female.

For graph of death rates under race stratification, the distribution of the White, the Black and the American Indian and Alaskan Native were on the right compared to the Hispanic and the Asian and Pacific Islander. It indicated that the White, the Black and the American Indian and Alaskan Native had relatively higher heart disease death rate than the Hispanic and the Asian and Pacific Islander.

Mortality Rate Gap Between Male group and Female group in CA

  • Create mortality rate gap variable named as Gap in dataset gender_joint
  • Visualize the magnitude of Gap by bubble charts via plotly for each county
  • There should be 58 counties in total.
fig_gendergap <- plot_ly(gender_joint, x = ~value_male, y = ~value_female, type = 'scatter', mode = 'markers',size = ~Gap, color = ~LocationDesc, colors = 'Paired',
        sizes = c(5, 45),
        marker = list(opacity = 0.5, sizemode = 'diameter'),
         text=~paste(paste("County: ", LocationDesc),
                     paste("Death rate/100,000(male):", value_male),
                     paste("Death rate/100,000(female):", value_female),
                     paste("Death rate/100,000(gap):", Gap),
                      sep="<br>"),hoverinfo="text")
fig_gendergap <- fig_gendergap %>% 
  layout(title = 'Gender Gap on heart disease death rate among CA county',
         xaxis = list(title = 'Mortality rate/100,000 population (male)', showgrid = FALSE),
         yaxis = list(title = 'Mortality rate/100,000 population (female)', showgrid = FALSE))

fig_gendergap

In the bubble chart, the mortality rate for male group lied on the x-axis and the mortality rate for female group lied on the y-axis. The differences between rates was shown via the size of bubble. The label of each bubble included mortality rate gaps and corresponding county names. From the chart, most counties exhibited obvious gaps between mortality rates, which was consistent as the results in gender distribution graph above. Among all counties, Butte County(208/100,000 population), Plumas County(206.8/100,000 population) and Yuba County(220.4/100,000 population) seemed to have greatest differences.

Mortality Rate Pattern in each county in CA (gender stratification)

  • Create county-level mortality rate map via plotly to show gradient level of mortality rate under gender stratification
  • Place California in the center of Choropleth map
## Pattern map for male group
fig_male <- plot_ly( text=~paste(paste("County: ", CA_gender1$LocationDesc),
                      paste("Death rate/100,000:", CA_gender1$value_male),
                      sep="<br>"),hoverinfo="text")
fig_male <- fig_male %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_gender1$fips,
    z = CA_gender1$value_male,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

fig_female <- plot_ly( text=~paste(paste("County: ", CA_gender1$LocationDesc),
                      paste("Death rate/100,000:", CA_gender1$value_female),
                      sep="<br>"),hoverinfo="text")
fig_female <- fig_female %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_gender1$fips,
    z = CA_gender1$value_female,
    colorscale="Cividis",
    reversescale=TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

Male group mortality rate in CA during 2014

Female group mortality rate in CA during 2014

Based on the map, the male group had darker color compared to the female in each county in general, which was the same as conclusions above.

  • Among all the counties for the male, the mortality rates of Kern County(476.9/100,000 population), Yuba County(506.1/100,000 population), Lake County(473.5/100,000 population) and Stanislaus County(469.4/100,000 population) were relatively higher.
  • For the female, the mortality rates of Tulare County(303.3/100,000 population), Kern County(319.7/100,000 population) and Stanislaus County(315.4/100,000 population) were relatively higher.
    From geographical perspective, the county along the coast had relatively lower heart disease death rate for both male and female groups.

Mortality Rate Pattern in Each County in CA During 2014(Race Stratification)

  • Create county-level mortality rate map via plotly to show gradient level of mortality rate under race stratification
  • Place California in the center of Choropleth map
## Pattern map for White group
fig_white <- plot_ly(
  text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_white),
                      sep="<br>"),hoverinfo="text")
fig_white <- fig_white %>% 
  add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_white,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5)) %>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Hispanic group
fig_hispanic <- plot_ly(
  text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_hispanic),
                      sep="<br>"),hoverinfo="text")
fig_hispanic <- fig_hispanic %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_hispanic,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Black group
fig_black <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_black),
                      sep="<br>"),hoverinfo="text")
fig_black <- fig_black %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_black,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

# Pattern map for Asian and Pacific Islander group
fig_asian_pacific <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_asian_pacific),
                      sep="<br>"),hoverinfo="text")
fig_asian_pacific <- fig_asian_pacific %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_asian_pacific,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

fig_indian_alaskan <- plot_ly(text=~paste(paste("County: ", CA_race1$LocationDesc),
                      paste("Death rate/100,000:", CA_race1$value_indian_alaskan),
                      sep="<br>"),hoverinfo="text")
fig_indian_alaskan <- fig_indian_alaskan %>% add_trace(
    type="choroplethmapbox",
    geojson = counties,
    locations = CA_race1$fips,
    z = CA_race1$value_indian_alaskan,
    colorscale="Cividis",
    reversescale = TRUE,
    zmin = 150,
    zmax = 500,
    marker=list(line=list(
      width=0),
      opacity=0.5))%>% 
  layout(
    mapbox=list(
      style="carto-positron",
      zoom =4,
      center=list(lon= -119.42, lat=36.78)))

White

Hispanic

Black

Asian and Pacific Islander

American Indian and Alaskan Native

Based on the map, the White, the Black and the American Indian and Alaskan Native had darker color compared to the Hispanic and the Asian and Pacific Islander in each county in general, which was the same as conclusions before.

  • Among all the counties for the White, the mortality rates of Kern County(438.6/100,000 population), Yuba County(447.3/100,000 population), and Stanislaus County(436.7/100,000 population) were relatively higher.
  • For the Hispanic group, the mortality rate of Kern County(296.2/100,000 population) was relatively higher.
  • For the Black group, the mortality rates of Kern County(501/100,000 population) and Calaveras County(507.5/100,000 population) were relatively higher.
  • For the Asian and Pacific Islander group, the mortality rates of Lake County(313.5/100,000 population), Glenn County(259/100,000 population) and Tulare County(507.5/100,000 population) were relatively higher.
  • For the American Indian and Alaskan Native group, the mortality rates of Lake County(557.2/100,000 population), Colusa County(457.2/100,000 population) and Del Norte County(462.5/100,000 population) were relatively higher.

Conclusion

In this study, the pattern of heart disease death rate in CA during 2014 was examined under gender and race stratification. In general, the male had a higher death rates compared to the female. The possible explanation was different level of smoking and alcohol intakes and adaptive ablity facing stressed events. For the race stratification, the White, the Black and the American Indian and Alaskan Native groups had a higher death rates compared to the Hispanic and the Asian and Pacific Islander. The possible reason might be the hemoglobin level and discrepancies of CA health coverage. Based on the map distribution, there were several counties including Kern County, Tulare County and Stanislaus County, whose heart disease death rate was distinctly higher compared to the other counties in California. All of three counties had relatively higher obesity rate which was one of leading cause towards heart disease. The possible reason included unsafe environment for physical activity, increased age and poor access towards healthy food.


Download the Full report

The full PDF version could be downloaded here


Citation

  1. Centers for Disease Control and Prevention. Underlying Cause of Death, 1999–2018. CDC WONDER Online Database. Atlanta, GA: Centers for Disease Control and Prevention; 2018. Accessed March 12, 2020.
  2. Virani SS, Alonso A, Aparicio HJ, Benjamin EJ, Bittencourt MS, Callaway CW, et al. Heart disease and stroke statistics—2021 update: a report from the American Heart Associationexternal. Circulation. 2021;143:e254–e743.